Regression for Linguists

Resources and Set-up

Author: Daniela Palleschi

Affiliation: Humboldt-Universität zu Berlin

Published: October 5, 2023

Resources

This course is mainly based on Winter (2019), which is an excellent introduction to regression for linguists. For even more introductory tutorials, I recommend going through Winter (2013) and Winter (2014). For a more intermediate textbook, I’d recommend Sonderegger (n.d.).

If you’re interested in the foundational writings on the topic of (frequentist) linear mixed models in (psycho)linguistic research, I’d recommend reading Baayen (2008); Baayen et al. (2008); Barr et al. (2013); Bates et al. (2015); Jaeger (2008); Matuschek et al. (2017); Vasishth (2022); Vasishth & Nicenboim (2016).

Assumptions about you

For this course, I assume that you are familiar with more classical statistical tests, such as the t-test, the chi-square test, etc. I also assume you are familiar with measures of central tendency (mean, median, mode), measures of dispersion/spread (standard deviation), and the concept of a normal distribution. Lacking this knowledge will not impede your progress in the course, but it is an important foundation on which we’ll be building. We can review these concepts in class as needed.
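As a quick refresher, the concepts listed above can be computed in a few lines of R. This is only a minimal sketch: the vector `rt` holds made-up reaction times, and the names `rt`, `stat_mode`, and `draws` are my own choices for illustration.

```r
# Hypothetical reaction times in milliseconds (made-up values for illustration)
rt <- c(420, 515, 480, 390, 515, 610, 455)

mean(rt)    # arithmetic mean
median(rt)  # middle value of the sorted data
sd(rt)      # sample standard deviation (n - 1 in the denominator)

# Base R has no built-in function for the statistical mode
# (mode() returns the storage type), so we tabulate the values instead
counts <- table(rt)
stat_mode <- as.numeric(names(counts)[counts == max(counts)])
stat_mode

# 1000 random draws from a normal distribution with mean 500 and sd 50
set.seed(1)
draws <- rnorm(1000, mean = 500, sd = 50)
summary(draws)
```

If any of these functions are unfamiliar, that is fine; we will come back to them in class as needed.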

Software

  • R: a statistical programming language (the underlying language)

  • RStudio: a program that facilitates working with R; our preferred integrated development environment (IDE)

  • LaTeX: a typesetting system that generates documents in PDF format

  • why R?

    • R and RStudio are open-source and free software
    • they are widely used in science and business

Install R

  • we need the free and open source statistical software R to analyze our data
  • download and install R: https://www.r-project.org
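
Once R is installed, one way to confirm that it works is to ask it for its own version. The sketch below uses only base R; the threshold `"4.0.0"` and the name `r_is_recent` are arbitrary choices for illustration, not a course requirement.

```r
# Print the installed R version as a single string
R.version.string

# Compare against a minimum version; "4.0.0" is an arbitrary illustrative threshold
r_is_recent <- getRversion() >= "4.0.0"
r_is_recent
```

You can type these lines into the R console (or, later, the Console pane in RStudio).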

Install RStudio

  • we need RStudio to work with R more easily
  • Download and install RStudio: https://rstudio.com
  • it can be helpful to keep English as the language in RStudio
    • we will find more helpful information if we search error messages in English on the internet
  • If you have problems installing R or RStudio, check out this help page (in German): http://methods-berlin.com/wp-content/uploads/Installation.html
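
If your R messages appear in another language, one possible way to switch them to English for the current session is the `LANGUAGE` environment variable. This is a sketch under the assumption that your R build supports translated messages; it affects R's own messages, not the RStudio menus, and whether it takes effect can depend on your operating system.

```r
# Ask R to use English for its messages in the current session
Sys.setenv(LANGUAGE = "en")

# Check which value is currently set
Sys.getenv("LANGUAGE")
```

The setting lasts only until you restart R, so it is safe to try out.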

Install LaTeX

  • we will not work with LaTeX directly, but it is needed in the background
  • Download and install LaTeX: https://www.latex-project.org/get/
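
To check from within R whether a LaTeX installation can be found, you could look for the usual LaTeX engines on your system's search path. This is a rough sketch: the three engine names are common ones that I am assuming here, and the names `engines` and `latex_found` are just illustrative.

```r
# Look up common LaTeX engines on the system search path;
# Sys.which() returns an empty string for programs it cannot find
engines <- Sys.which(c("pdflatex", "xelatex", "lualatex"))
engines

# TRUE if at least one engine was found
latex_found <- any(nzchar(engines))
latex_found
```

If `latex_found` is `FALSE` right after installing LaTeX, restarting RStudio (or your computer) sometimes helps, because the search path is read at start-up.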

Further resources

  • many aspects of this course are inspired by (nordmann_applied_2022?) and (wickham_r_nodate?)
    • both freely available online (in English)
  • for German-language resources, visit the website of Methodengruppe Berlin

Troubleshooting

  • Error messages are very common in programming, at all levels.
  • Finding solutions to these error messages is an art in itself.
  • Google is your friend! If possible, google in English to get more information
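
To make this concrete, here is a small sketch of how an error can be captured and taken apart in R, so that you know which part is worth searching for. The failing call `log("a")` is a deliberately broken example, and `result` is just an illustrative name.

```r
# Run code that fails on purpose and keep the error instead of stopping
result <- tryCatch(
  log("a"),               # taking the logarithm of text is not possible
  error = function(e) e   # return the error object itself
)

# The message is the part worth copying into a search engine
conditionMessage(result)

# The call shows which piece of code raised the error
conditionCall(result)
```

In everyday work you will simply see the error printed in the console; the point is that the text after the colon is usually the best search term.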

References

Baayen, R. H. (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics using R.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412. https://doi.org/10.1016/j.jml.2007.12.005
Baayen, R. H., & Shafaei-Bajestan, E. (2019). languageR: Analyzing linguistic data: A practical introduction to statistics. https://CRAN.R-project.org/package=languageR
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious Mixed Models. arXiv Preprint, 1–27. https://doi.org/10.48550/arXiv.1506.04967
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446. https://doi.org/10.1016/j.jml.2007.11.007
Lüdecke, D., Ben-Shachar, M. S., Patil, I., Waggoner, P., & Makowski, D. (2021). performance: An R package for assessment, comparison and testing of statistical models. Journal of Open Source Software, 6(60), 3139. https://doi.org/10.21105/joss.03139
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315. https://doi.org/10.1016/j.jml.2017.01.001
Sonderegger, M. (n.d.). Regression Modeling for Linguistic Data.
Sonderegger, M. (2023). Regression Modeling for Linguistic Data.
Vasishth, S. (2022). Some right ways to analyze (psycho)linguistic data [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/y54va
Vasishth, S., & Nicenboim, B. (2016). Statistical methods for linguistic research: Foundational Ideas. Language and Linguistics Compass, 10(11), 591–613. https://doi.org/10.1111/lnc3.12207
Winter, B. (2013). Linear models and linear mixed effects models in R: Tutorial 1.
Winter, B. (2014). A very basic tutorial for performing linear mixed effects analyses (Tutorial 2).
Winter, B. (2019). Statistics for Linguists: An Introduction Using R. Routledge. https://doi.org/10.4324/9781315165547
Source code
---
author: "Daniela Palleschi"
institute: Humboldt-Universität zu Berlin
# footer: "Lecture 1.1 - R und RStudio"
lang: en
date: "`r Sys.Date()`"
format:
  html:
    number-sections: true
    number-depth: 3
    toc: true
    code-overflow: wrap
    code-tools: true
    self-contained: true
    fig-width: 6
bibliography: references.bib
csl: apa.csl
execute:
  eval: true # evaluate code chunks
  echo: true # show the code of each chunk in the output
  message: false # hide messages produced by the code
  error: true # include errors in the output instead of halting the render
  warning: false # hide warnings produced by the code
---

# Resources and Set-up {.unnumbered}

```{r, eval = T, cache = F}
#| echo: false
# Create references.json file based on the citations in this script
# make sure you have 'bibliography: references.json' in the YAML
```

# Resources

This course is mainly based on @winter_statistics_2019, which is an excellent introduction to regression for linguists. For even more introductory tutorials, I recommend going through @winter_linear_2013 and @winter_very_2014. For a more intermediate textbook, I'd recommend @sonderegger_regression_2023.

If you're interested in the foundational writings on the topic of (frequentist) linear mixed models in (psycho)linguistic research, I'd recommend reading @baayen_analyzing_2008; @baayen_mixed-effects_2008; @barr_random_2013-1; @bates_parsimonious_2015; @jaeger_categorical_2008; @matuschek_balancing_2017; @vasishth_right_2022-1; @vasishth_statistical_2016.
    
# Assumptions about you

For this course, I assume that you are familiar with more classical statistical tests, such as the t-test, the chi-square test, etc. I also assume you are familiar with measures of central tendency (mean, median, mode), measures of dispersion/spread (standard deviation), and the concept of a normal distribution. Lacking this knowledge will not impede your progress in the course, but it is an important foundation on which we'll be building. We can review these concepts in class as needed.

# Software {#sec-software}

- R: a statistical programming language (the underlying language)
- RStudio: a program that facilitates working with R; our preferred integrated development environment (IDE)
- LaTeX: a typesetting system that generates documents in PDF format

- why R?
  -  R and RStudio are open-source and free software
  -  they are widely used in science and business

::: {.content-hidden when-format="pdf"}
::: {.column width="30%"}
```{r eval = F, fig.env = "figure", out.width="50%", fig.align = "center"}
#| echo: false

magick::image_read(here::here("media/R_logo.png"))
```
:::

::: {.column width="30%"}
```{r eval =F , fig.env = "figure", out.width="75%", fig.align = "center"}
#| echo: false

magick::image_read(here::here("./media/RStudio_logo.png"))
```
:::
:::

```{r eval = F, fig.env = "figure", out.width="75%", fig.align = "center"}
#| echo: false

magick::image_read(here::here("./media/LaTeX_logo.png"))
```


::: {.content-visible when-format="pdf"}
```{r eval = F, fig.env = "figure", fig.pos="H", out.width="75%", fig.align = "center"}
#| echo: false

R <- grid::rasterGrob(as.raster(png::readPNG(here::here("./media", "R_logo.png"))))

RStudio <- grid::rasterGrob(as.raster(png::readPNG(here::here("./media", "RStudio_logo.png"))))

latex <- grid::rasterGrob(as.raster(png::readPNG(here::here("./media", "LaTeX_logo2.png"))))

gridExtra::grid.arrange(R, NULL, RStudio, NULL, latex, ncol=5,
                        widths=c(.25,.125,.25,.125,.25))
```
:::

## Install R

- we need the free and open source statistical software R to analyze our data
- download and install R: <https://www.r-project.org>

## Install RStudio

- we need RStudio to work with R more easily
- Download and install RStudio: <https://rstudio.com>
- it can be helpful to keep English as the language in RStudio
    - we will find more helpful information if we search error messages in English on the internet

- If you have problems installing R or RStudio, check out this help page (in German): <http://methods-berlin.com/wp-content/uploads/Installation.html>

## Install LaTeX

- we will not work with LaTeX directly, but it is needed in the background
- Download and install LaTeX: <https://www.latex-project.org/get/>

# Further resources

- many aspects of this course are inspired by @nordmann_applied_2022 and @wickham_r_nodate
    - both freely available online (in English)
- for German-language resources, visit the website of [Methodengruppe Berlin](http://methods-berlin.com/de/r-lernplattform/)

## Troubleshooting

- Error messages are very common in programming, at all levels.
- Finding solutions to these error messages is an art in itself.
- Google is your friend! If possible, google in English to get more information

# References {.unlisted .unnumbered visibility="uncounted"}

::: {#refs custom-style="Bibliography"}
:::